ABSTRACT

This paper discusses scalability of standard genetic program- 
ming (GP) and the probabilistic incremental program evolu- 
tion (PIPE). To investigate the need for both effective mix- 
ing and linkage learning, two test problems are considered: 
the ORDER problem, which is rather easy for any recombination-based GP, and TRAP, the deceptive trap problem, which
requires the algorithm to learn interactions among subsets 
of terminals. The scalability results show that both GP and 
PIPE scale up polynomially with problem size on the sim- 
ple ORDER problem, but they both scale up exponentially on 
the deceptive problem. This indicates that while standard 
recombination is sufficient when no interactions need to be 
considered, for some problems linkage learning is necessary. 
These results are in agreement with the lessons learned in 
the domain of binary-string genetic algorithms (GAs). Fur- 
thermore, the paper investigates the effects of introducing 
unnecessary and irrelevant primitives on the performance of 
GP and PIPE. 

Categories and Subject Descriptors 

I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search;

I.2.6 [Artificial Intelligence]: Learning;

G.1.6 [Numerical Analysis]: Optimization 

Keywords 

Genetic programming, PIPE, scalability, order problem, trap 
problem 

1. INTRODUCTION 

To solve large and complex problems, scalability is among
the primary concerns of an optimization practitioner. However,
only a few studies [18, 19] examine scalability in genetic
programming (GP) [8]. The same holds for simple approaches to
using probabilistic recombination in GP within the estimation of
distribution algorithm (EDA) framework [12, 15, 14], such as the
probabilistic incremental program evolution (PIPE) [17].

The purpose of this paper is to study the scalability of stan- 
dard GP and PIPE on two decomposable GP problems: 
ORDER and TRAP. The two algorithms perform as expected 
and they solve ORDER scalably while failing to scale up on 
TRAP. Additionally, the paper studies the effects of intro- 
ducing unnecessary and irrelevant primitives. Both GP and 
PIPE are shown to deal with these two sources of diffi- 
culty well. The results presented in this paper confirm that 
binary-string GAs have a lot in common with GP and PIPE, 
and thus the lessons learned in the design, study, and ap- 
plication of standard GAs and their extensions should carry 
over to GP as argued for example in |6l ll8irH?| . 

Section 2 describes the algorithms investigated in this paper:
GP and PIPE. Section 3 explains the test problems. Section 4
provides and discusses experimental results. Section 5 presents
important topics for future work in this line of research.
Section 6 summarizes the paper. Finally, Section 7 concludes
the paper.

2. METHODS 

Both GP and PIPE work with programs encoded as labeled- 
tree structures and both can be applied to the same class 
of problems. While GP generates new candidate programs 
using standard variation operators, such as crossover and 
mutation, PIPE builds and samples a probabilistic model in 
the form of a tree of mutually independent nodes. Therefore, 
the difference between GP and PIPE is in their variation 
operator (see Figure 1).

This section describes GP and PIPE. The section starts by 
discussing standard GP and closes by describing the proba- 
bilistic algorithm PIPE. 

2.1 Genetic Programming 

Genetic programming (GP) [8] is a genetic algorithm (GA)
pi] that evolves programs instead of fixed-length strings. 
Programs are represented by trees where nodes represent
functions and leaves represent variables and constants.

Figure 1: Standard genetic programming (GP) and the probabilistic incremental program evolution (PIPE).

GP starts with a population of random candidate programs. 
Each program is evaluated on a given task and its fitness 
value is assigned. A population of promising programs is 
then selected using one of the standard GA selection oper- 
ators, such as tournament or truncation selection. Some 
of the selected programs can be directly copied into the 
new population, the remaining ones are copied after ap- 
plying variation operators, such as crossover and mutation. 
Crossover usually proceeds by exchanging randomly selected 
subtrees between two programs, whereas mutation usually 
replaces a randomly selected subtree of a program by a ran- 
domly generated one. This process is repeated until termi- 
nation criteria are met. 
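
The variation step described above can be sketched in a few lines. This is a minimal illustration and not the lilgp implementation: the nested-tuple program encoding and the helper names `subtrees`, `replace`, `crossover`, and `tournament` are assumptions made for this example, and no depth limit is enforced.

```python
import random

# Assumed encoding: an internal node is a tuple (function, child, child, ...),
# a leaf is a plain string such as "X1".

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))


def replace(tree, path, new):
    """Return a copy of tree in which the node at path is replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(a, b, rng):
    """Exchange one randomly selected subtree between programs a and b."""
    path_a, sub_a = rng.choice(list(subtrees(a)))
    path_b, sub_b = rng.choice(list(subtrees(b)))
    return replace(a, path_a, sub_b), replace(b, path_b, sub_a)


def tournament(population, fitness, rng, size=2):
    """Return the fittest of `size` programs drawn without replacement."""
    return max(rng.sample(population, size), key=fitness)
```

Because crossover only exchanges material, the two offspring together always contain exactly the leaves of the two parents; nothing in the operator looks at how program components interact.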

Since standard GP variation operators proceed without con- 
sidering interactions between different components of se- 
lected programs, they are likely to experience difficulties 
with solving problems where different program components 
interact strongly. However, problems that can be decom- 
posed into subproblems of order one should be easy for any 
standard GP based on recombination. This intuition is verified
with experiments in Section 4. Similar behavior can be observed
in GAs; GAs with standard variation operators work well on
problems with no interactions between decision variables
[13, 7, 15], but they often fail on problems with highly
interacting decision variables [20, 15].

We implemented GP using the lilgp GP library developed by the
Genetic Algorithms Research and Applications Group (GARAGe)
at Michigan State University.

2.2 PIPE 

In the probabilistic incremental program evolution (PIPE)
algorithm [16, 17], computer programs or mathematical
expressions are evolved as in GP. However, pairwise crossover
and mutation are replaced by building a probabilistic model of
promising programs and sampling the model.

Like GP, PIPE represents programs by labeled trees where 
each internal node represents a function and each leaf repre- 
sents a variable or a constant. The initial population is also 
generated at random. All programs in the population are 
then evaluated and selection is applied to select the popu- 
lation of promising programs. Instead of applying crossover 
and mutation to a part of the selected population to generate 
new programs, PIPE now builds a probabilistic model of the 
selected programs in the form of a tree. This probabilistic 
model is then sampled to generate new candidate programs 
that form the new population. The process is repeated until 
the termination criteria are met. 

Next, the methods for learning and sampling the probabilis- 
tic model in PIPE are described. 

2.2.1 Learning the Probabilistic Model 
The probabilistic model in PIPE is a tree with the structure 
corresponding to the structure of candidate programs. Since 
different programs may be of different structure and size, 
the population is first parsed to find the smallest tree that 
contains every structure in the selected population. Each 
node of a program in the selected population then directly 
corresponds to one node in the model, whereas the children 
of each internal node represent arguments of the function in 
this node. Figure 2 illustrates probabilistic models used in
PIPE. 

If there are functions of different arities, the number of chil- 
dren of each node in the probabilistic model is equal to the 
maximum arity of a function in this node in the selected 
population. For a function of smaller arity, the first children 
are interpreted as arguments of this function (in an arbitrary 
fixed ordering). 

PIPE then parses the selected population and computes
the probabilities of different functions and terminals in each
node of the probabilistic model. The nodes of the probabilistic
model thus consist of tables of probabilities, and there is
one probability for each function or terminal in each node.

Figure 2: A probabilistic model of a population of
programs in the form of a tree with nodes representing
the probabilities of functions and terminals. All
nodes are modeled independently.
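
The learning step can be sketched as a frequency count per tree position. This is an illustrative sketch under assumptions, not the implementation used in the experiments: programs are assumed to be nested tuples `(function, child, ...)` with string leaves, a model node is identified by its path of child indices from the root, and `learn_model` is a hypothetical name.

```python
from collections import Counter, defaultdict


def learn_model(selected):
    """Return {path: {primitive: probability}} estimated from selected programs.

    The set of paths is the smallest tree shape that contains every
    selected program; each node is estimated independently of the others.
    """
    counts = defaultdict(Counter)

    def visit(tree, path):
        if isinstance(tree, tuple):      # internal node: count the function
            counts[path][tree[0]] += 1
            for i, child in enumerate(tree[1:]):
                visit(child, path + (i,))
        else:                            # leaf: count the terminal
            counts[path][tree] += 1

    for program in selected:
        visit(program, ())

    return {
        path: {symbol: n / sum(table.values()) for symbol, n in table.items()}
        for path, table in counts.items()
    }
```

Each table is normalized only over the programs that actually reach that position, so deeper nodes are estimated from fewer samples than the root.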

2.2.2 Sampling the Probabilistic Model 
Sampling of the probabilistic model starts in the root of 
the probabilistic model. The same recursive procedure is 
used to generate each node. First, a function or terminal 
is generated in the current node based on the distribution 
encoded by the table of probabilities in this node. If the 
function requires several arguments, a necessary number of 
children are generated recursively. The recursive generation 
terminates in a node whenever a terminal is generated in 
this node and thus no children have to be generated. Since 
the probabilistic model is built from an actual population of 
programs, the sampling will never cross the boundaries of 
the model. 
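
A minimal sketch of this recursive sampling follows. The model layout (a dictionary from node path to a probability table), the `ARITY` table, and the function name `sample` are assumptions made for illustration.

```python
import random

# Assumed arity table; any symbol not listed is treated as a terminal.
ARITY = {"JOIN": 2}


def sample(model, rng, path=()):
    """Generate one program top-down from independent per-node tables."""
    table = model[path]
    symbols = list(table)
    weights = [table[s] for s in symbols]
    symbol = rng.choices(symbols, weights=weights)[0]
    arity = ARITY.get(symbol, 0)
    if arity == 0:
        return symbol                    # terminal: recursion stops here
    children = tuple(sample(model, rng, path + (i,)) for i in range(arity))
    return (symbol,) + children


# Hypothetical model: every position where JOIN has nonzero probability
# also has entries for both of its children.
example_model = {
    (): {"JOIN": 1.0},
    (0,): {"X1": 0.5, "X2": 0.5},
    (1,): {"JOIN": 0.5, "X3": 0.5},
    (1, 0): {"X1": 1.0},
    (1, 1): {"X2": 1.0},
}
```

A function can only be drawn at a position where some selected program had that function, so its children are guaranteed to exist in the model and the lookup `model[path]` does not fail for a model learned from real programs.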

Using the probabilistic model of PIPE to model and sample
candidate programs resembles the univariate marginal
distribution algorithm (UMDA) [12, 14], which models each
string position independently of the values in other positions.
Interactions between each node and its context are ignored.
It can therefore be expected that using this model will lead to
inferior results on problems where program components interact
strongly, just as the univariate model generally fails if
string positions interact [20]. On the other hand, if different
program components are mutually independent, PIPE should work
well. This intuition is verified with experiments in Section 4.

We implemented PIPE by incorporating probabilistic
recombination into the lilgp library developed by GARAGe
at Michigan State University.

3. TEST PROBLEMS 

In order to test scalability, we need a class of problems where 
size can be modified while the inherent problem difficulty 
does not grow prohibitively fast. In fixed-length string GAs, 
decomposable problems of bounded difficulty [5] can be used
as a challenging but solvable class of problems. Two types
of decomposable problems for fixed-length string GAs are
common: onemax and concatenated traps. In onemax, the
contribution of each bit is independent of its context. On 
the other hand, in concatenated traps, bits in each trap par- 
tition interact and cannot be effectively processed without 
considering other bits in the same trap partition. 

Problems similar to onemax and concatenated traps were
also created for GP, where candidate solutions are represented
by program trees [18]. Two classes of problems
from [18] are considered:



1. ORDER: OneMax-like, GP-easy problem. 

2. TRAP: Deceptive-trap-like, GP-difficult problem.



ORDER should be easy for any recombination-based GP. How- 
ever, since standard variation operators do not consider in- 
teractions between different program components, TRAP can 
be expected to lead to exponential scalability of both stan- 
dard GP and PIPE. The problems are described next. 

3.1 Problem 1: Order 

The primitive set of an $l$-primitive ORDER problem consists of
a binary function JOIN and complementary terminals $X_i$ and
$\bar{X}_i$ for $i \in \{1, 2, \ldots, l\}$. A candidate solution of the ORDER
problem is a binary tree with JOIN in all internal nodes and
either $X_i$'s or $\bar{X}_i$'s at its leaves. The candidate solution's
output is determined by parsing the program tree inorder
(from left to right). The program expresses $X_i$ if, during the
inorder parse, $X_i$ is encountered before its complement $\bar{X}_i$
and neither $X_i$ nor its complement is encountered earlier.
For all $i \in \{1, 2, \ldots, l\}$, if $X_i$ is unexpressed, $\bar{X}_i$ is expressed
instead. One terminal is thus expressed from each pair $X_i$
and $\bar{X}_i$.

For all $i \in \{1, 2, \ldots, l\}$, an equal unit of fitness value is
accredited if $X_i$ is expressed:

$$ f_1(x_i) = \begin{cases} 1 & \text{if } x_i \in \{X_1, X_2, \ldots, X_l\}, \\ 0 & \text{otherwise.} \end{cases} \tag{1} $$

The fitness function for ORDER is defined as

$$ f(x) = \sum_{i=1}^{l} f_1(x_i), \tag{2} $$

where $x$ is the set of primitives expressed by the program.
Given that trees can be sufficiently large, the expression for
a globally optimal solution of an $l$-primitive ORDER problem
is $\{X_1, X_2, \ldots, X_l\}$ and thus its fitness value is $l$.

For example, consider a candidate solution for a 4-primitive
ORDER problem shown in Figure 3. The sequence of leaves
visited during the inorder parse is $\{\bar{X}_3, X_1, \bar{X}_1, X_2, \bar{X}_4, \bar{X}_3\}$,
the expression of this sequence is $\{X_1, X_2, \bar{X}_3, \bar{X}_4\}$, and the
fitness of this solution is thus 2.
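
The expression mechanism and the ORDER fitness can be sketched as follows. The encoding is an assumption made for illustration: internal nodes are tuples `("JOIN", left, right)`, a positive terminal is written `"X3"` and its complement `"~X3"`, and the tree shape in the usage example is hypothetical. Pairs that do not occur in the tree contribute nothing, which gives the same fitness as expressing the complement.

```python
def leaves(tree):
    """Leaves of a nested-tuple program in left-to-right (inorder) order."""
    if isinstance(tree, tuple):
        return [leaf for child in tree[1:] for leaf in leaves(child)]
    return [tree]


def express(tree):
    """Map index i to True if X_i is expressed, False if its complement is.

    For each pair, only the first occurrence in the inorder parse counts.
    """
    expressed = {}
    for leaf in leaves(tree):
        positive = not leaf.startswith("~")
        index = int(leaf.lstrip("~")[1:])
        expressed.setdefault(index, positive)
    return expressed


def order_fitness(tree):
    """Number of pairs for which the positive terminal X_i is expressed."""
    return sum(express(tree).values())


# Hypothetical 4-primitive program
example = ("JOIN",
           ("JOIN", ("JOIN", "~X3", "X1"), "~X1"),
           ("JOIN", "X2", ("JOIN", "~X4", "~X3")))
```

For this tree, only the positive terminals of pairs 1 and 2 are encountered first, so `order_fitness(example)` evaluates to 2.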

3.2 Problem 2: Deceptive Trap 

In standard GAs, deceptive functions 013 are designed to 
thwart the very mechanism of selectorecombinative search
by punishing any localized hillclimbing and requiring mixing
of whole building blocks at or above the order of deception.
Using such adversarially designed functions is a stiff test
(in some sense the stiffest test) of algorithm performance.
The idea is that if an algorithm can beat an adversarially
designed test function, it can solve other problems that
are equally hard or easier than the adversary. Furthermore,
if the building blocks of such deceptive functions are not
identified and respected by selectorecombinative GAs, then
they almost always converge to the local optimum.

Figure 3: A candidate solution for a 4-primitive
ORDER problem. The output of the program is
$\{X_1, X_2, \bar{X}_3, \bar{X}_4\}$ and the fitness of this solution is thus
2.

TRAP is designed to test the same mechanisms in GP. Fitness 
is computed so that if interactions between different components
of the program are not considered, optimization may
be misled away from the global optimum. As
with standard GAs on deceptive functions, standard GP is
expected to fail in solving TRAP scalably, indicating the need 
for linkage learning in GP. 

Programs in TRAP also consist of one binary function JOIN
and $l$ pairs of complementary primitives $X_i$ and $\bar{X}_i$. The
expression mechanism of the program for TRAP is identical
to that of ORDER. The difference is in the fitness
evaluation procedure.

In TRAP, the expressed set of primitives is first mapped to an
$l$-bit binary string. The $i$th bit of the string is 1 if and only
if $X_i$ was expressed; otherwise, the $i$th bit of the string is 0.
The resulting binary string is then partitioned into groups of
$k$ bits each (the partitioning is fixed during the entire run)
and a trap function is applied to each group:

$$ f_k(u) = \begin{cases} 1.0 & \text{if } u = k, \\ (1.0 - \delta)\left(1 - \dfrac{u}{k-1}\right) & \text{if } u < k, \end{cases} \tag{3} $$

where $u$ is the number of ones in the input string of $k$ bits.

The fitness of a TRAP solution is then computed
by adding the contributions of all groups of $k$ bits together.

The difficulty of the trap function can be adjusted by modifying
the values of $k$ and $\delta$. The problem becomes more difficult as the
value of $k$ is increased and that of $\delta$ is decreased. A 4-bit
deceptive trap function is illustrated in Figure 4; in this
paper we use traps with $k = 3$ and $\delta = 1$.
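
The TRAP evaluation can be sketched as below. The sketch assumes the commonly used trap form, in which a group of $k$ ones scores 1 and any other group scores $(1 - \delta)(1 - u/(k-1))$. It also assumes that groups are contiguous blocks of $k$ bits and that $l$ is a multiple of $k$. The function names are hypothetical.

```python
def trap(u, k, delta):
    """Trap contribution of one group of k bits that contains u ones."""
    if u == k:
        return 1.0
    return (1.0 - delta) * (1.0 - u / (k - 1))


def trap_fitness(expressed_positive, l, k, delta):
    """Sum of trap contributions over consecutive groups of k bits.

    expressed_positive: set of 1-based indices i for which X_i is expressed.
    """
    bits = [1 if i in expressed_positive else 0 for i in range(1, l + 1)]
    return sum(trap(sum(bits[g:g + k]), k, delta) for g in range(0, l, k))
```

With $k = 4$ and $\delta = 0.25$, `trap(0, 4, 0.25)` is 0.75, `trap(3, 4, 0.25)` is 0.0 and `trap(4, 4, 0.25)` is 1.0: every step toward the optimum lowers fitness until the whole group is complete.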




Figure 4: A fully deceptive trap function with $k = 4$
and $\delta = 0.25$ (horizontal axis: number of ones, $u$).

The important feature of additively separable trap functions
is that, when looking at the performance of any subset of $k$ bits
corresponding to one trap, it seems to be better to propagate
0s (here, this means eliminating the $X_i$'s and substituting $\bar{X}_i$ or
nothing). As shown in [18], if interactions between different
components of the program are not considered, it can be
expected that GP will scale up poorly on this problem.

3.3 Other primitives 

In addition to ORDER and TRAP with JOIN and $l$ terminal
pairs, we tested GP and PIPE on ORDER with two additional
primitives: a negative-join function and junk (unexpressed)
terminals. The purpose of the additional tests was
to determine how GP and PIPE respond to more complex
interactions and unnecessary program primitives.

3.3.1 Primitive negative-join 

NEG_JOIN affects all its descendant terminals by expressing
each primitive $X_i$ as its negation $\bar{X}_i$; analogously, all
descendants $\bar{X}_i$ are expressed as $X_i$. If a terminal has several
NEG_JOIN ancestors, only one of them is considered and the
terminal is negated only once.
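
A sketch of expression with NEG_JOIN, under the same assumed encoding as elsewhere in these examples (nested tuples for internal nodes, `"X3"` for a positive terminal and `"~X3"` for its complement). The function names are hypothetical.

```python
def expressed_leaves(tree, negated=False):
    """Yield (index, positive) for each leaf, left to right, after negation.

    A leaf is flipped once if it has at least one NEG_JOIN ancestor,
    no matter how many such ancestors there are.
    """
    if isinstance(tree, tuple):
        negated = negated or tree[0] == "NEG_JOIN"
        for child in tree[1:]:
            yield from expressed_leaves(child, negated)
    else:
        positive = not tree.startswith("~")
        index = int(tree.lstrip("~")[1:])
        yield index, positive != negated


def order_fitness(tree):
    """ORDER fitness: pairs whose first expressed terminal is positive."""
    first_seen = {}
    for index, positive in expressed_leaves(tree):
        first_seen.setdefault(index, positive)
    return sum(first_seen.values())
```

The flag `negated` is passed down the tree, which is exactly where the interaction comes from: the best terminal for a leaf depends on whether any ancestor is a NEG_JOIN.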

NEG_JOIN is unnecessary for solving ORDER and it does not
introduce a less complex or easier to find global optimum.
Furthermore, NEG_JOIN introduces interactions into ORDER
because the best value in each leaf depends on its ancestors.
Nonetheless, these interactions are relatively simple as many
leaves are expected to contain NEG_JOIN on the path to the
root.

For example, for the program shown in Figure 5, the inorder
pass through the program results in the following sequence 
of leaves: {A3, X\ , X\ , X2, A4, A3}. The expression gives 
us {Ai, X2, A3, A4}, and thus the fitness is 3. 

3.3.2 Junk-code terminals 

Junk-code or JUNK terminals represent unnecessary primitives
that are irrelevant for the particular problem. In biological
terms, JUNK terminals correspond to junk code in
DNA. During the expression phase, JUNK terminals are simply
ignored and they thus do not influence the overall fitness
at all.

Figure 5: A candidate solution for a 4-primitive ORDER
problem with NEG_JOIN. The output of the program
is {Xl , X%, A3, A4}, and the fitness of this solution is
thus 3.

Figure 6: A candidate solution for a 4-primitive ORDER
problem with JUNK terminals. The output of the program
is $\{X_1, X_2, \bar{X}_3, \bar{X}_4\}$, and the fitness of this solution
is thus 2.

Adding JUNK terminals makes the optimization problem more 
difficult, because additional primitives enlarge the search 
space without simplifying the problem. The influence of 
JUNK terminals can be tuned by changing the number of 
unique JUNK terminals. 

Figure 6 shows a tree with two JUNK terminals. The inorder
parse results in the following sequence of leaves (ignoring
JUNK): $\{\bar{X}_3, X_1, X_2, \bar{X}_4\}$. The expression gives us
$\{X_1, X_2, \bar{X}_3, \bar{X}_4\}$, and thus the fitness of this solution is 2.

4. EXPERIMENTS 

This section compares the performance of GP and PIPE on 
three variants of ORDER and one variant of TRAP. 

4.1 Description of experiments 

The scalability of GP and PIPE was tested on four classes 
of problems: 

(i) basic ORDER (no JUNK or NEG_JOIN),

(ii) basic TRAP (no JUNK or NEG_JOIN),

(iii) ORDER with NEG_JOIN, and

(iv) ORDER with JUNK terminals, where the number of unique
JUNK terminals is set to $l/5$.

Figure 7: Scalability of GP and PIPE on ORDER.

The scalability experiments were performed by testing both 
algorithms on problem instances with an increasing number 
of primitives. 

Additionally, the effects of increasing the number of unnec- 
essary primitives on the performance of GP and PIPE were 
studied by testing GP and PIPE on a 20-primitive ORDER 
with an increasing number of JUNK terminals (from 5 to 40).

Binary tournament selection was used in both GP and PIPE.
The probability of crossover in GP was set to 1.0. To focus
on the effects of recombination, no mutation was used. The
initial population in both methods was generated using the
standard half-and-half method. The maximum tree depth was
set to one more than the depth of the smallest tree that can
store the global optimum. The population size used was within
10% of the minimum population size required to find the global
optimum in 30 independent runs; it was determined
using a bisection method. The runs were terminated when the
algorithms found the global optimum or when the number of
generations became too large for the particular problem.

4.2 Results 

Figure 7 shows the scalability of GP and PIPE on ORDER
without NEG_JOIN or JUNK terminals. Problem instances of
different size were examined; more specifically, $l$ = 5, 10, 20,
40, 60, 80, and 100. The figure shows the average number
of function evaluations of 30 successful runs with respect to
the problem size (number of positive literals). The results
indicate that PIPE is slightly more efficient than GP, but
both GP and PIPE scale up with a low-order polynomial.
These results are in agreement with the behavior observed
in binary-string GAs on the simple onemax problem. On
onemax, both the simple GA and UMDA find the optimum in
low-order polynomial time [13, 7, 5, 15]; however, UMDA
performs slightly better [15] because it uses a more effective
recombination for this type of problem.

Figure 8 compares the scalability of GP and PIPE on
TRAP without NEG_JOIN or JUNK terminals. The size of one
trap is $k = 3$ and the signal difference is $\delta = 1$. Problem
instances of different size were examined; more specifically,
$l$ = 6, 12, 18, 21, 24, and 33. On TRAP, GP performs
slightly better than PIPE. This can be explained by
its weaker recombination operator, because here recombination
disrupts important partial solutions [20], as
can be hypothesized based on the performance of standard
GAs on similar problems. Nonetheless, both GP and PIPE
scale up poorly, and the results indicate exponential growth of
the number of function evaluations with problem size.

Figure 9: Scalability of GP and PIPE on ORDER with
NEG_JOIN (horizontal axis: number of terminals).

Figure 9 compares the scalability of GP and PIPE on ORDER
with NEG_JOIN. Problem instances of different size were
examined; more specifically, $l$ = 5, 10, 20, 40, 60, 80, and
100. Both GP and PIPE perform similarly to how they perform on basic
ORDER without NEG_JOIN, but there is a slight decrease in
their performance because of the interactions introduced by
NEG_JOIN.

Figure 10 compares the scalability of GP and PIPE on ORDER
with $l/5$ unique JUNK terminals. For example, a problem
instance with $l = 20$ positive terminals contains 4 unique JUNK
terminals. Both GP and PIPE seem to be capable of dealing
with these irrelevant terminals and achieve performance
comparable to that on basic ORDER.

The last two sets of experiments are similar in that they 
show how the performance of GP and PIPE changes when 
adding irrelevant terminals into the representation. ORDER 
with I = 20 terminals is used with the number of JUNK ter- 
minals ranging from 5 to 40 (5, 10, 15, 20, and 40). The ex- 
periments differ in the bound on the maximum tree depth. 
Figure 11 shows the results with the depth limited to at
most 6 (so there are at most 7 levels including the root). 
Figure 12 shows the results with the depth limited to at
most 7 (so there are at most 8 levels including the root). 
The problem with the smaller maximum depth is more diffi- 
cult for both GP and PIPE because JUNK terminals obstruct 
the creation of an optimal solution that is only slightly larger 
than the maximum allowed tree. PIPE deals better with this 
"lack of space" than GP does. However, in both cases, the 
number of evaluations still appears to grow with a low-order 
polynomial or slower as irrelevant terminals are added. 

5. FUTURE WORK 

Future work should study the scalability of GP, PIPE, and 
other similar approaches on the problems presented in this 
paper and other problems where problem size can be mod- 
ified without affecting the inherent problem difficulty. The 
efforts to introduce linkage learning into GP (for example,
[18, 2]) should continue in order to succeed in the design of robust
GP methods that provide a scalable solution to broad classes 
of GP problems. Finally, more theory should be designed 
to match the achievements in this area in the domain of 
GAs pil^0l l5l ll31inilTu1| . 

6. SUMMARY 

This paper focused on the scalability of two GP algorithms: 
standard GP and PIPE. 

Two basic test functions were used: ORDER and TRAP. Both
functions were defined using one binary function JOIN and $l$
complementary terminal pairs $X_i$ and $\bar{X}_i$ for $i \in \{1, 2, \ldots, l\}$.
ORDER can be solved without considering interactions between
different program components, whereas TRAP introduces
strong interactions, which make this function difficult
for both standard crossover and mutation of GP, as well as
the probabilistic recombination of PIPE.

Figure 11: Scalability of GP and PIPE on ORDER with
an increasing number of JUNK terminals (from 5 to
40). The maximum depth of candidate programs is
6.

Figure 12: Scalability of GP and PIPE on ORDER with
an increasing number of JUNK terminals (from 5 to
40). The maximum depth of candidate programs is
7.

The scalability of GP and PIPE was tested on basic ORDER 
and TRAP. Additionally, ORDER was extended by adding ei- 
ther of the following two primitives: (1) a binary function 
NEG_JOIN and (2) JUNK (or irrelevant) terminals. Thus, there
were 4 problem types examined. 

On all four problem types, the scalability of GP and PIPE 
was first tested by applying these algorithms to problem 
instances of different size (number I of positive terminals). 
Then, the sensitivity of GP and PIPE to the proportion of 
irrelevant terminals to the relevant ones was examined. 

7. CONCLUSIONS 

The results presented in this paper indicate that the be- 
havior of different variants of GP can be expected to be 
similar to that of standard binary-string GAs. There are 
two important consequences of this fact. First, as
indicated in [18], to solve some classes of problems scalably,
linkage learning may have to be incorporated into GP in
order to identify and exploit interactions between different
program components. Second, the lessons learned in the design
and application of binary-string GAs should carry over
to GP, as argued for example in [5, 19]; the first steps in
this direction are represented by the decision-making model
of population sizing in GP [19], which was based on
the decision-making population-sizing model for standard
GAs 00. 

The results also indicate that if the recombination operator 
captures interactions in the problem properly, increasing the 
mixing effects of recombination leads to better performance. 
That is why PIPE outperformed standard GP on problems 
where program components could be treated independently. 



This fact together with the need for linkage learning should 
encourage the application of probabilistic recombination
operators of estimation of distribution algorithms (EDAs) [12,
14] to the domain of GP. Some representatives of EDAs
applied to the GP domain are (To1[TTl[T51 l3). 

Finally, the results show that both GP and PIPE can deal 
with irrelevant terminals and unnecessary functions rela- 
tively well and their performance gets only slightly worse 
when adding these primitives. 